Preview dataset #1288

Huongg · 2023-03-15T17:42:46Z

Description

Fixes #907. For now, every time the user clicks on a CSV, Excel or Parquet dataset in Kedro-Viz, the first 40 rows are loaded into the metadata panel.

Design:
Link to the journey and behaviour description
Link to the metadata side panel

Development notes

To test this locally, clone and pull the latest changes from this branch: https://github.com/kedro-org/kedro-plugins/tree/preview-csv-dataset

Then cd to kedro-plugins/kedro-datasets and run `pip install -e .`. To check, run `pip list` and confirm that kedro-datasets and kedro-viz point to your local checkouts.
Once done, go back to kedro-viz/demo-project, run `kedro run`, then `kedro viz`. It should show you the changes from both repos.

WIP: tests to be added to cover both changes, here and in kedro-plugins

Screen.Recording.2023-03-16.at.11.02.58.mov

QA notes

Checklist

Read the contributing guidelines
Opened this PR as a 'Draft Pull Request' if it is work-in-progress
Updated the documentation to reflect the code changes
Added new entries to the RELEASE.md file
Added tests to cover my changes

Signed-off-by: huongg <[email protected]>

datajoely · 2023-03-20T13:24:03Z

package/kedro_viz/models/flowchart.py

@@ -20,6 +20,9 @@

 logger = logging.getLogger(__name__)

+PREVIEW_DATASETS = ["pandas.csv_dataset.CSVDataSet",
+                    "pandas.parquet_dataset.ParquetDataSet", "pandas.excel_dataset.ExcelDataSet"]


Supporting Spark at launch would be a really nice stretch goal, but not a dealbreaker.

datajoely · 2023-03-20T13:25:07Z

package/kedro_viz/models/flowchart.py

+        if self.type in PREVIEW_DATASETS:
+            # If the kedro-datasets is on the latest and does have the _preview
+            if (hasattr(dataset, '_preview')):
+                self.preview = dataset._preview(40)


Why not make this configurable here?

Based on our telemetry, hardly anyone uses this.

That's interesting. I would argue this is a slightly more compelling thing for users to actually change, but it's also something we can wait to see if users start asking for :)

This is now done, @datajoely 😄 I added it today. By default the feature is always on unless the user chooses to turn it off.

Is it a toggle, or can they pass the number of rows to preview?

It is just a toggle at the moment, with the number of rows set to 40.
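The guard discussed in this thread can be sketched roughly as follows. This is a minimal illustration and not the code merged in flowchart.py: `get_preview`, the `enabled` argument, `PREVIEW_ROWS`, the two stand-in classes, and the shape of the returned dictionary are all hypothetical names chosen for the example. Only `PREVIEW_DATASETS`, the `hasattr` check, and the fixed value of 40 come from the diff and comments above.

```python
from typing import Any, Optional

# Dataset types eligible for preview, copied from the diff hunk above.
PREVIEW_DATASETS = [
    "pandas.csv_dataset.CSVDataSet",
    "pandas.parquet_dataset.ParquetDataSet",
    "pandas.excel_dataset.ExcelDataSet",
]
PREVIEW_ROWS = 40  # fixed row count mentioned in the thread (name is hypothetical)


def get_preview(dataset: Any, dataset_type: str, enabled: bool = True) -> Optional[Any]:
    """Return a preview, or None when the feature is off or unsupported."""
    if not enabled or dataset_type not in PREVIEW_DATASETS:
        return None
    # Older kedro-datasets installs may not define _preview at all.
    if not hasattr(dataset, "_preview"):
        return None
    return dataset._preview(PREVIEW_ROWS)


class FakeCSVDataSet:
    """Hypothetical stand-in for a dataset that supports preview."""

    def _preview(self, nrows: int) -> dict:
        # The return shape is an assumption made for this example only.
        return {"columns": ["id"], "data": [[i] for i in range(nrows)]}


class LegacyDataSet:
    """Hypothetical stand-in for a dataset without a _preview method."""
```

With these stand-ins, `get_preview(FakeCSVDataSet(), "pandas.csv_dataset.CSVDataSet")` returns 40 rows, while passing `enabled=False`, a `LegacyDataSet`, or an unlisted type returns `None`.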

datajoely

Hugely excited about this :)

antonymilne

Great work @Huongg! ⭐

I have many comments, but most importantly I think you should test three things before you make any changes:

1. What happens if you change your code in _preview in kedro-datasets to raise an exception (the easiest way is to just put in the code 1/0)?
2. What happens if you run it on a dataset that's pandas.CSVDataSet but the file doesn't exist?
3. What happens if you run it on a dataset that's pandas.CSVDataSet but with some data missing?

I suspect the first two of these will result in no metadata panel loading for the dataset and that the 3rd test case will work.

Assuming I'm right here, you should then:

1. Write a test that fails test case 2
2. Make the changes I suggest
3. Test the cases that failed before and hopefully they pass now

I have more ideas for where we should go with this feature, but I'll post them in a separate issue. I think we're also going to see cases that aren't currently handled well, e.g. if the rows have labels as well as the columns, but I'm happy to release this as an MVP for now.

antonymilne

Really nice work 🎉

Just to double check, what now happens in these three cases?

1. You change your code in _preview in kedro-datasets to raise an exception (the easiest way is to just put in the code 1/0).
2. A dataset that's pandas.CSVDataSet but the file doesn't exist.
3. A dataset that's pandas.CSVDataSet but with some data missing.

antonymilne · 2023-03-23T17:22:11Z

RELEASE.md

@@ -13,6 +13,7 @@ Please follow the established format:
 - Remove metrics plots from metadata panel and add link to the plots on Experiment tracking. (#1268)
 - Link plot and JSON dataset names from experiment tracking to the flowchart. (#1165)
 - Bump minimum version of React from 16.8.6 to 17.0.2. (#1282)
+- Show preview of data in metadata panel. (#1288)


Might be worth putting this at the top of the list, since to users it's much more interesting than the other changes 😀 (maybe the React version point should go under "Bug fixes and other changes"?)

Maybe also worth saying "preview of pandas.CSVDataSet and pandas.ExcelDataSet" too.

Hey, I like the idea of including "preview of pandas.CSVDataSet and pandas.ExcelDataSet" here too. It will also be mentioned in our release highlights in the UI, but there's no harm in mentioning it here again.

I think the React version might be a major change actually but maybe @tynandebold can confirm this?

The React version change will make this a major release, so probably ok to leave it where it is.

Huongg · 2023-03-23T18:04:59Z

> Really nice work 🎉
>
> Just to double check, what now happens in these three cases?
>
> 1. change your code in _preview in kedro-datasets to raise an exception (easiest way is to just put in the code 1/0)?
> 2. a dataset that's pandas.CSVDataSet but the file doesn't exist?
> 3. a dataset that's pandas.CSVDataSet but with some data missing?

Hey @AntonyMilneQB, thank you. To confirm:

1. It reports an error: .... could not be previewed. Full exception: ZeroDivisionError: division by zero
2. It also reports an error:

'reviews' could not be previewed. Full exception: DataSetError: Failed while loading data from data set                              flowchart.py:600
CSVDataSet(filepath=/Users/Huong_Nguyen/Documents/dev/kedro-viz/demo-project/data/01_raw/reviews.csv, load_args={'nrows': 40},protocol=file, save_args={'index': False}). No columns to parse from file

3. This one works normally; it shows some empty fields in the table, as in the screenshot below.

Are these what you expected?
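For readers following along, the behaviour reported for the three cases can be approximated with the sketch below. It is an illustration under assumptions, not the kedro-viz implementation: `safe_preview` and the three preview functions are hypothetical, the file path is made up, and the standard-library `csv` module stands in for the pandas-based loading that the real datasets use. Only the wording of the log message is modelled on the output quoted above.

```python
import csv
import io
import logging
from itertools import islice
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def safe_preview(name: str, preview_fn: Callable[[int], dict], nrows: int = 40) -> Optional[dict]:
    """Call preview_fn; on any error, log a warning and return None."""
    try:
        return preview_fn(nrows)
    except Exception as exc:  # broad on purpose: a failed preview must not break the panel
        logger.warning(
            "'%s' could not be previewed. Full exception: %s: %s",
            name, type(exc).__name__, exc,
        )
        return None


def broken_preview(nrows: int) -> dict:
    """Case 1: the preview code itself raises (ZeroDivisionError)."""
    return {"rows": 1 / 0}


def missing_file_preview(nrows: int) -> dict:
    """Case 2: the file does not exist (hypothetical path)."""
    with open("/nonexistent/reviews.csv", newline="") as handle:
        return {"rows": list(islice(csv.reader(handle), nrows))}


def sparse_preview(nrows: int) -> dict:
    """Case 3: some values are missing; empty cells come back as empty strings."""
    text = "id,score\n1,\n2,5\n"
    return {"rows": list(islice(csv.reader(io.StringIO(text)), nrows))}
```

Calling `safe_preview` with the first two functions returns `None` after logging a warning, so the rest of the metadata can still be served; the third returns its rows with an empty string where the value is missing.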

antonymilne · 2023-03-23T21:15:00Z

Yes, all as I expected thank you! 👍

rashidakanchwala

Amazing!!!!!! Thanks Huong!

Huongg added 8 commits March 13, 2023 17:04

set up preview in response and flowchart.py

007f7c2

Signed-off-by: huongg <[email protected]>

return preview for metaData

0ff46cf

Signed-off-by: huongg <[email protected]>

filter data and render the table

e469560

Signed-off-by: huongg <[email protected]>

add preview-table component

38e641c

Signed-off-by: huongg <[email protected]>

fix styling for table in metadata panel

4fd5136

Signed-off-by: huongg <[email protected]>

add previewTable to metadata-modal

6ca4ec4

Signed-off-by: huongg <[email protected]>

different theme for preview table

36931d0

Signed-off-by: huongg <[email protected]>

add classname for preview-table

2abe3a6

Signed-off-by: huongg <[email protected]>

Huongg mentioned this pull request Mar 16, 2023

preview-csv-dataset kedro-org/kedro-plugins#129

Merged

4 tasks

Huongg added 6 commits March 16, 2023 12:28

add size for PreviewTable

85dda7c

Signed-off-by: huongg <[email protected]>

add cursor pointer

6aa6b6e

Signed-off-by: huongg <[email protected]>

styling for row hover state

9a6eae7

Signed-off-by: huongg <[email protected]>

light and dark theme for hovering

dc19dd1

Signed-off-by: huongg <[email protected]>

include excel and parquet dataset

d41b4f6

Signed-off-by: huongg <[email protected]>

Merge branch 'main' of github.com:kedro-org/kedro-viz into preview-da…

5d775cf

…taset Signed-off-by: huongg <[email protected]>

Huongg marked this pull request as ready for review March 17, 2023 09:53

Huongg requested review from rashidakanchwala and tynandebold as code owners March 17, 2023 09:53

rashidakanchwala reviewed Mar 17, 2023

Huongg added 6 commits March 17, 2023 12:22

add margin for table large

6e143f9

Signed-off-by: huongg <[email protected]>

check if self.type is in preview_datasets

de652bd

Signed-off-by: huongg <[email protected]>

remove preview table style

20bcefd

Signed-off-by: huongg <[email protected]>

add preview text

6a90f53

Signed-off-by: huongg <[email protected]>

Adding test for preview table

a7c07b0

Signed-off-by: huongg <[email protected]>

adding condition to not call _preview if not existed

e7b1a56

Signed-off-by: huongg <[email protected]>

rashidakanchwala reviewed Mar 20, 2023

move the hasAttr inside the previewDataSet loops

2ce39d0

Signed-off-by: huongg <[email protected]>

datajoely reviewed Mar 20, 2023

Huongg requested a review from rashidakanchwala March 20, 2023 15:49

Merge branch 'main' into preview-dataset

15e61d4

antonymilne self-requested a review March 21, 2023 10:52

Huongg added 4 commits March 21, 2023 11:14

ignore updateContent file

9869229

Signed-off-by: huongg <[email protected]>

formatting

bc7a12c

Signed-off-by: huongg <[email protected]>

test_data_node_metadata_preview for flowchart.py

a772729

Signed-off-by: huongg <[email protected]>

remove parquet dataset preview

ed13b44

Signed-off-by: huongg <[email protected]>

antonymilne reviewed Mar 21, 2023

Huongg added 10 commits March 22, 2023 08:10

restructure data in preview table

bc25c3e

Signed-off-by: huongg <[email protected]>

update name to onExpandMetaDataClick

e446fdd

Signed-off-by: huongg <[email protected]>

restructure the flowchart.py for _preview

c726a7b

Signed-off-by: huongg <[email protected]>

update test to reflect the new data structure

36b7c66

Signed-off-by: huongg <[email protected]>

adding test for flowchart.py

3658109

Signed-off-by: huongg <[email protected]>

downgrade mypy package

538cf9c

Signed-off-by: huongg <[email protected]>

formatting

63b0b7d

Signed-off-by: huongg <[email protected]>

include # pylint: disable=attr-defined for _preview

4f558c7

Signed-off-by: huongg <[email protected]>

include # type: ignore

8d5ed2a

Signed-off-by: huongg <[email protected]>

include ignore in response

1244076

Signed-off-by: huongg <[email protected]>

Huongg requested a review from antonymilne March 23, 2023 09:01

Huongg added 2 commits March 23, 2023 09:57

update wrong test name

e3f7924

Signed-off-by: huongg <[email protected]>

sticky header for preview table

9b89c6c

Signed-off-by: huongg <[email protected]>

antonymilne approved these changes Mar 23, 2023

feature flag and connect the preview data with config table

0455310

Signed-off-by: huongg <[email protected]>

use more generic classname for metadata modal

3d54783

Signed-off-by: huongg <[email protected]>

rashidakanchwala approved these changes Mar 24, 2023

Huongg merged commit f8c2be5 into main Mar 24, 2023

Huongg deleted the preview-dataset branch March 24, 2023 12:53

tynandebold mentioned this pull request Mar 24, 2023

Release v6.0.0 #1297

Merged

5 tasks
